We consider the representational state complexity of unranked tree automata. The bottom-up
computation of an unranked tree automaton may be either deterministic or nondeterministic, and further
variants arise depending on whether the horizontal string languages defining the transitions are
represented by a DFA or an NFA. Also, we consider for unranked tree automata the alternative syntactic
definition of determinism introduced by Cristau et al. (FCT'05, Lect. Notes Comput. Sci. 3623,
pp. 68-79). We establish upper and lower bounds for the state complexity of conversions between
different types of unranked tree automata.
1 Introduction

Descriptional complexity, or state complexity, of finite automata has been extensively studied in recent 
years, see (8l[T0l[T6l[T3 and references listed there. On the other hand, very few papers explicitly discuss 
state complexity of tree automata. For classical tree automaton models operating on ranked trees [6, 7]
many state complexity results are similar to corresponding results on string automata. For example, it is
well known that determinizing an $n$-state nondeterministic bottom-up tree automaton gives an automaton
with at most $2^n$ states.

Modern applications of tree automata, such as XML document processing lPT2l [T71 . use automata 
operating on unranked trees. One approach is to first encode the unranked trees as binary trees O. 
The other approach that we consider here is to define the computation of the tree automaton directly 
on unranked XML-trees flUElE]]- The set of transitions of an unranked tree automaton is, in general, 
infinite and the transitions are usually specified in terms of a regular language. Thus, in addition to the 
finite set of states used in the bottom-up computation, an unranked tree automaton needs for each state $q$
and input symbol $\sigma$ a finite string automaton to recognize the horizontal language consisting of strings
of states defining the transitions associated to $q$ and $\sigma$.

Here we consider bottom-up (frontier-to-root) unranked tree automata. Roughly speaking, we get 
different models depending on whether the bottom-up computation is nondeterministic or deterministic 
and whether the horizontal languages are recognized by an NFA or a DFA ((non-)deterministic finite 
automaton). Furthermore, there is more than one way to define determinism for unranked tree automata 
and we compare here two of the variants. 

The more common definition [6, 17] requires that for any input symbol $\sigma$ and two distinct states
$q_1$, $q_2$, the horizontal languages associated, respectively, with $q_1$ and $\sigma$ and with $q_2$ and $\sigma$ are disjoint.
The condition guarantees that the bottom-up computation assigns a unique state to each node. To
distinguish this from the syntactic definition of determinism of [5, 15], we call a deterministic tree automaton
where the horizontal languages defining the transitions are specified by DFAs a weakly deterministic
tree automaton. Note that a computation of a weakly deterministic automaton still needs to "choose"
which of the DFAs (associated with different states) is used to process the sequence of states that the
computation reached at the children of the current node. Since the intersection of distinct horizontal
languages is empty, the choice is unambiguous; however, when beginning to process the sequence of states
the automaton has no way of knowing which DFA to use.

A different definition, which we call strong determinism, was considered in [5, 15]. A strongly
deterministic automaton associates to each input symbol $\sigma$ a single DFA $H_\sigma$ equipped with an output function;
the state at a parent node labeled by $\sigma$ is determined (via the output function) by the state $H_\sigma$ reaches
after processing the sequence of states corresponding to the children. Strongly deterministic automata can
be minimized efficiently and the minimal automaton is unique [5, 15]. On the other hand, interestingly, it
was shown in [11] that for weakly deterministic tree automata the minimization problem is NP-complete
and the minimal automaton need not be unique.

We study the state complexity of determinizing different variants of nondeterministic tree automata. 
That is, we develop upper and lower bounds for the size of deterministic tree automata that are equivalent 
to given nondeterministic automata. We define the size of an unranked tree automaton as a pair of integers 
consisting of the number of states used in the bottom-up computation, and the sum of the sizes of the 
NFAs defining the horizontal languages. Note that the two types of states play very different roles in
computations of the tree automaton. The other possibility would be, as is done e.g. in [11], to count 
simply the total number of all states in the different components. 

Also, we study the state complexity of the conversions between the strongly and the weakly deter- 
ministic tree automata. Although the former model can be viewed to be more restricted, there exist tree 
languages for which the size of a strongly deterministic automaton is smaller than the size of the minimal 
weakly deterministic automaton. It turns out to be more difficult to establish lower bounds for the size of 
weakly deterministic automata than is the case for strongly deterministic automata. Naturally, this can 
be expected due to the intractability of the minimization of weakly deterministic automata [11].

It should be noted that there are many other deterministic automaton models used for applications on 
unranked trees, such as stepwise tree automata OH and nested word automata EEl. Size comparisons 
between, respectively, stepwise tree automata and strongly deterministic automata or automata operating 
on binary encodings of unranked trees can be found in [11]. Much work remains to be done on state
complexity of tree automata. 

To conclude the introduction, we summarize the contents of the paper. In Section 2 we recall definitions for
tree automata operating on unranked trees and introduce some notation. In Section 3 we study the descriptional
complexity of conversions between the strongly and the weakly deterministic tree automata, and in
Section 4 we study the size blow-up of converting different variants of nondeterministic tree automata to
strongly and weakly deterministic automata, respectively. Many of the proofs have been omitted in this
extended abstract for the DCFS proceedings.

2 Preliminaries 

We assume that the reader is familiar with the basics of formal languages and finite automata [18].
Below we briefly recall some definitions for tree automata operating on unranked trees and fix notations.
More details on unranked tree automata and references can be found in [6, 11]. A general reference on
tree automata operating on ranked trees is Q. 



The paper [5] refers to weak and strong determinism, respectively, as semantic and syntactic determinism.
Basic notions concerning trees, such as the root, a leaf, a subtree, the height of a tree and children of
a node are assumed to be known. The set of non-negative integers is $\mathbb{N}$. A tree domain is a prefix-closed
subset $D$ of $\mathbb{N}^*$ such that if $ui \in D$, $u \in \mathbb{N}^*$, $i \in \mathbb{N}$, then $uj \in D$ for all $j < i$. The set of nodes of a tree $t$ is
represented in the well-known way as a tree domain $\mathrm{dom}(t)$ and the node labeling is given by a mapping
$\mathrm{dom}(t) \to \Sigma$ where $\Sigma$ is a finite alphabet of symbols. Thus, we use labeled ordered unranked trees. Each
node of a tree has a finite number of children with a linear order, but there is no a priori upper bound on
the number of children of a node. The set of all $\Sigma$-labeled trees is $T_\Sigma$.

We introduce the following notation for trees. For $i \ge 0$, $a \in \Sigma$ and $t \in T_\Sigma$, we denote by
$a^i(t) = a(a(\ldots a(t) \ldots))$ a tree, where the nodes $\varepsilon, 1, \ldots, 1^{i-1}$ are labelled by $a$ and the subtree at node $1^i$ is $t$.
When $a \in \Sigma$, $w = b_1 b_2 \cdots b_n \in \Sigma^*$, $b_i \in \Sigma$, $1 \le i \le n$, we use $a(w)$ to denote the tree $a(b_1, b_2, \ldots, b_n)$. When
$L$ is a set of strings, $a(L) = \{a(w) \mid w \in L\}$. The set of all $\Sigma$-trees where exactly one leaf is labelled by a
special symbol $x$ ($x \notin \Sigma$) is $T_\Sigma[x]$. For $t \in T_\Sigma[x]$ and $t' \in T_\Sigma$, $t(x \leftarrow t')$ denotes the tree obtained from $t$ by
replacing the unique occurrence of variable $x$ by $t'$.

A nondeterministic (unranked) tree automaton (NTA) is a tuple $A = (Q, \Sigma, \delta, F)$, where $Q$ is the
finite set of states, $\Sigma$ is the alphabet labeling nodes of input trees, $F \subseteq Q$ is the set of final states, and $\delta$
is a mapping from $Q \times \Sigma$ to the subsets of $Q^*$ which satisfies the condition that, for each $q \in Q$, $\sigma \in \Sigma$,
$\delta(q, \sigma)$ is a regular language. The language $\delta(q, \sigma)$ is called the horizontal language associated with $q$
and $\sigma$.

A computation of $A$ on a tree $t \in T_\Sigma$ is a mapping $C : \mathrm{dom}(t) \to Q$ such that for $u \in \mathrm{dom}(t)$, if
$u \cdot 1, \ldots, u \cdot m$, $m \ge 0$, are the children of $u$ then $C(u \cdot 1) \cdots C(u \cdot m) \in \delta(C(u), t(u))$. In case $u$ is a leaf the
condition means that $m = 0$ and $\varepsilon \in \delta(C(u), t(u))$.

Intuitively, if a computation of $A$ has reached the children of a $\sigma$-labelled node $u$ in a sequence of
states $q_1, q_2, \ldots, q_m$, the computation may nondeterministically assign a state $q$ to the node $u$ provided
that $q_1 q_2 \cdots q_m \in \delta(q, \sigma)$. For $t \in T_\Sigma$, $t^A \subseteq Q$ denotes the set of states that in some bottom-up computation
$A$ may reach at the root of $t$. The tree language recognized by $A$ is defined as $L(A) = \{t \in T_\Sigma \mid t^A \cap F \neq \emptyset\}$.
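As a rough illustration of these definitions, the following Python sketch computes the set $t^A$ bottom-up and tests membership in $L(A)$. The tree encoding `(label, children)`, the function names and the toy automaton are assumptions made for this example only; horizontal languages are given as membership predicates rather than as NFAs to keep the sketch short.

```python
from itertools import product

def root_states(tree, states, in_delta):
    """Return t^A: the states some bottom-up computation can assign to the root."""
    label, children = tree
    child_sets = [root_states(c, states, in_delta) for c in children]
    reachable = set()
    for q in states:
        # q is possible if some choice of child states spells a word in delta(q, label).
        # For a leaf, product() yields only the empty word ().
        if any(in_delta(q, label, w) for w in product(*child_sets)):
            reachable.add(q)
    return reachable

def accepts(tree, states, in_delta, final):
    """t is in L(A) iff t^A intersects the set of final states."""
    return bool(root_states(tree, states, in_delta) & final)

# Hypothetical toy NTA over {a, b}: 'B' marks b-leaves, 'E'/'O' mark a-nodes with an
# even/odd number of children, and 'X' marks any a-node with at least one child.
STATES = {"B", "E", "O", "X"}
FINAL = {"E"}

def in_delta(q, label, w):
    if q == "B":
        return label == "b" and len(w) == 0
    if q == "E":
        return label == "a" and len(w) % 2 == 0
    if q == "O":
        return label == "a" and len(w) % 2 == 1
    if q == "X":
        return label == "a" and len(w) >= 1
    return False

leaf_b = ("b", [])
t1 = ("a", [leaf_b, leaf_b])
t2 = ("a", [leaf_b])
```

Because the horizontal languages of `E` and `X` overlap, this toy automaton is not deterministic, and `root_states` may return more than one state. Enumerating all words with `product` is exponential in the number of children; it is meant only to mirror the definition.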

For a tree automaton $A = (Q, \Sigma, \delta, F)$, we denote by $H^A_{q,\sigma}$, $q \in Q$, $\sigma \in \Sigma$, a nondeterministic finite
automaton (NFA) on strings recognizing the horizontal language $\delta(q, \sigma)$. The NFA $H^A_{q,\sigma}$ is called a
horizontal automaton, and states of different horizontal automata are called collectively horizontal states.
We refer to the states of $Q$ that are used in the bottom-up computation as vertical states.

A tree automaton $A = (Q, \Sigma, \delta, F)$ is said to be (semantically) deterministic (a DTA) if for $\sigma \in \Sigma$ and
any two states $q_1 \neq q_2$, $\delta(q_1, \sigma) \cap \delta(q_2, \sigma) = \emptyset$.

We get a further refinement of classes of automata depending on whether the horizontal languages 
are defined using DFAs or NFAs. We use NTA(M) or DTA(M), respectively, to denote (the class of) 
nondeterministic or deterministic tree automata where the horizontal languages are specified by the el- 
ements in class M. For example, NTA(DFA) denotes the tree automata where the horizontal languages 
are recognized by a DFA. 

Note that when referring to a tree automaton $A = (Q, \Sigma, \delta, F)$ it is always assumed that the relation
$\delta$ is specified in terms of automata $H^A_{q,\sigma}$, $q \in Q$, $\sigma \in \Sigma$, and by saying that $A$ is an NTA(DFA) we
indicate that each $H^A_{q,\sigma}$ is a DFA. We refer to DTA(DFA)'s also as weakly deterministic tree automata to
distinguish them from the notion of strong determinism defined below.

If $A$ is a DTA(NFA), for any tree $t \in T_\Sigma$ the bottom-up computation of $A$ assigns a unique vertical
state to the root of $t$, that is, $t^A$ is a singleton set or empty. If the horizontal automata $H^A_{q,\sigma}$ are DFAs,
furthermore, for each transition the sequence of horizontal states is processed deterministically. However,
as discussed in Section 1, a computation that has reached children of a $\sigma$-labeled node in a sequence of
states $w \in Q^*$ still needs to make the choice which of the DFAs $H^A_{q,\sigma}$, $q \in Q$, is used to process $w$. For
this reason we consider also the following notion introduced in [5] that we call strong determinism.

A tree automaton $A = (Q, \Sigma, \delta, F)$ is said to be strongly deterministic if for each $\sigma \in \Sigma$, the transitions
are defined by a single DFA augmented with an output function as follows. For $\sigma \in \Sigma$ define

$$H^A_\sigma = (S_\sigma, Q, s^0_\sigma, F_\sigma, \gamma_\sigma, \lambda_\sigma), \tag{1}$$

where $(S_\sigma, Q, s^0_\sigma, F_\sigma, \gamma_\sigma)$ is a DFA with set of states $S_\sigma$ where $s^0_\sigma \in S_\sigma$ is the start state, $F_\sigma \subseteq S_\sigma$ is
the set of final states and $\gamma_\sigma : S_\sigma \times Q \to S_\sigma$ is the transition function, and $\lambda_\sigma$ is a function $F_\sigma \to Q$.
Then we require that for all $q \in Q$ and $\sigma \in \Sigma$: $\delta(q, \sigma) = \{w \in Q^* \mid \lambda_\sigma(\gamma_\sigma(s^0_\sigma, w)) = q\}$. Note that the
definition guarantees that $\delta(q_1, \sigma) \cap \delta(q_2, \sigma) = \emptyset$ for any distinct $q_1, q_2 \in Q$, $\sigma \in \Sigma$. The class of strongly
deterministic tree automata is denoted as SDTA.²
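The following Python sketch shows how a strongly deterministic automaton might be evaluated: one DFA per input symbol reads the states of the children, and the output function applied to the state reached gives the state of the parent. The dictionary layout (`start`, `trans`, `out`) and the example automaton are hypothetical and serve only as an illustration.

```python
def sdta_state(tree, H):
    """Return the vertical state an SDTA assigns to the root, or None if undefined."""
    label, children = tree
    dfa = H[label]                      # the single horizontal DFA for this symbol
    s = dfa["start"]
    for child in children:
        q = sdta_state(child, H)
        if q is None:                   # no computation exists on a subtree
            return None
        s = dfa["trans"].get((s, q))
        if s is None:                   # partial DFA: the computation is stuck
            return None
    # The output function is defined only on final states of the horizontal DFA.
    return dfa["out"].get(s)

# Hypothetical SDTA over {a, b}: 'B' for b-leaves, 'E'/'O' for a-nodes with an
# even/odd number of children.
VERTICAL = ["B", "E", "O"]
H = {
    "b": {"start": "s", "trans": {}, "out": {"s": "B"}},
    "a": {
        "start": "e",
        "trans": {**{("e", q): "o" for q in VERTICAL},
                  **{("o", q): "e" for q in VERTICAL}},
        "out": {"e": "E", "o": "O"},
    },
}
```

In contrast to a weakly deterministic automaton, no choice among several horizontal DFAs is needed here: the symbol at the node alone selects the DFA.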

By the size of an NFA $B$, denoted $\mathrm{size}(B)$, we mean the number of states of $B$. Because the roles
played by vertical and horizontal states, respectively, in the computations of a tree automaton are
essentially different, when measuring the size of an automaton we count the two types of states separately.
The size of an NTA(NFA) $A = (Q, \Sigma, \delta, F)$ is defined as

$$\mathrm{size}(A) = \Big[\, |Q|;\ \sum_{q \in Q,\, \sigma \in \Sigma} \mathrm{size}(H^A_{q,\sigma}) \,\Big] \quad (\in \mathbb{N} \times \mathbb{N}).$$

Using notations of (1), the size of an SDTA $A$ is defined as the pair of integers $\mathrm{size}(A) = [\, |Q|;\ \sum_{\sigma \in \Sigma} |S_\sigma| \,]$.

We make the following notational convention that allows us to use symbols of $\Sigma$ in the definition
of horizontal languages. Unless otherwise mentioned, we assume that a tree automaton always assigns
to each leaf labeled by $\sigma$ a state $\bar{\sigma}$ that is not used anywhere else in the computation. That is, for
$\sigma \in \Sigma$ and $q \in Q$, $\varepsilon \in \delta(q, \sigma)$ only if $q = \bar{\sigma}$, $\delta(\bar{\sigma}, \sigma) = \{\varepsilon\}$ and $\delta(\bar{\tau}, \sigma) = \emptyset$ for all $\sigma, \tau \in \Sigma$, $\sigma \neq \tau$.
When there is no confusion, we denote also $\bar{\sigma}$ simply by $\sigma$. When the alphabet $\Sigma$ is fixed, there is only a
constant number of the special states $\bar{\sigma}$ and since, furthermore, the special states have the same function
in all types of tree automata, for simplicity, we do not include them when counting the vertical states. The
purpose of this convention is to improve readability: many of our constructions become more transparent
when alphabet symbols can be used explicitly to define horizontal languages. The convention does not
change our state complexity bounds that are generally given within a multiplicative constant.

To conclude this section we give two lemmas that provide lower bound estimates for vertical and 
horizontal states of SDTAs, respectively. The lower bound condition for vertical states applies, more 
generally, for DTA(NFA)'s, however, obtaining lower bounds for the number of horizontal states of 
weakly deterministic automata turns out to be more problematic. 

Lemma 2.1 Let $A$ be an SDTA or a DTA(NFA) with a set of vertical states $Q$ recognizing a tree language
$L$. Assume $R = \{t_1, \ldots, t_m\} \subseteq T_\Sigma$ where for any $1 \le i < j \le m$ there exists $t \in T_\Sigma[x]$ such that $t(x \leftarrow t_i) \in L$
iff $t(x \leftarrow t_j) \notin L$. Then $|Q| \ge |R| - 1$.

Lemma 2.2 Let $A$ be an SDTA with a set of vertical states $Q$ recognizing a tree language $L$. Let $S$ be a
finite set of tuples of $\Sigma$-trees and let $b \in \Sigma$. Assume that for any distinct tuples $(r_1, \ldots, r_m), (s_1, \ldots, s_n) \in S$
there exists $t \in T_\Sigma[x]$ and a sequence of trees $u_1, \ldots, u_k$ such that

$$t(x \leftarrow b(r_1, \ldots, r_m, u_1, \ldots, u_k)) \in L \ \text{ iff } \ t(x \leftarrow b(s_1, \ldots, s_n, u_1, \ldots, u_k)) \notin L. \tag{2}$$

Then the horizontal automaton $H^A_b$ needs at least $|S| - 1$ states.

2 Strictly speaking, $\delta$ is superfluous in the tuple specifying an SDTA and the original definition of [5] gives instead the
automata $H^A_\sigma$, $\sigma \in \Sigma$. We use $\delta$ in order to make the notation compatible with our other models, and to avoid having to define
bottom-up computations of SDTAs separately.
3 Size comparison of the strongly and weakly deterministic tree automata

Here we give upper and lower bounds for the size of a weakly deterministic automaton (a DTA(DFA)) 
simulating a strongly deterministic one (an SDTA), and vice versa. The computation of a DTA(DFA) 
can, in some sense, nondeterministically choose which of the horizontal DFAs it uses at each transition. 
An SDTA does not have this capability and it can be expected that, in the worst case, an SDTA may 
need considerably more states than an equivalent DTA(DFA). However, there exist also tree languages 
for which an SDTA can be considerably more succinct than a DTA(DFA). 



3.1 Converting an SDTA to a DTA(DFA) 

We show that an SDTA can be quadratically smaller than a DTA(DFA). This can be compared with [11]
where it was shown that deterministic stepwise tree automata can be quadratically smaller than SDTAs
(that are called dPUTAs in [11]).

The upper bound for the conversion is expected but we include a short proof. In the lemma below
(and afterwards) we use "$\le$" to compare pairs of integers componentwise. As introduced in Section 2,
for an SDTA $A$ we denote the deterministic automata for the corresponding horizontal languages by $H^A_\sigma$,
$\sigma \in \Sigma$.

Lemma 3.1 Let $A = (Q, \Sigma, \delta, F)$ be an arbitrary SDTA.
We can construct an equivalent DTA(DFA) $A'$ where

$$\mathrm{size}(A') \le \Big[\, |Q|;\ |Q| \times \sum_{\sigma \in \Sigma} \mathrm{size}(H^A_\sigma) \,\Big]. \tag{3}$$



Proof. For $\sigma \in \Sigma$ denote the components of $H^A_\sigma$ as in (1). Construct an equivalent DTA(DFA) $A' =
(Q, \Sigma, \delta', F)$, where for each $\sigma \in \Sigma$, $q \in Q$, $\delta'(q, \sigma) = \{w \in Q^* \mid \lambda_\sigma(\gamma_\sigma(s^0_\sigma, w)) = q\}$. The languages
$\delta'(q_1, \sigma)$ and $\delta'(q_2, \sigma)$, $q_1 \neq q_2$, are always disjoint, and $\delta'(q, \sigma)$ is recognized by a DFA obtained from
$H^A_\sigma$ by choosing as the set of final states $\lambda_\sigma^{-1}(q)$, $q \in Q$, $\sigma \in \Sigma$. The construction does not change the
number of vertical states and (3) holds. ■
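The construction in this proof can be sketched in Python as follows: for each vertical state $q$ and symbol $\sigma$ we copy the horizontal DFA of the SDTA and keep as final states exactly those that the output function maps to $q$. The dictionary layout and the small example automaton are assumptions introduced for illustration.

```python
def sdta_to_dta_dfa(vertical_states, H):
    """For each (q, sigma), copy H_sigma and use lambda^{-1}(q) as the final states."""
    delta = {}
    for sigma, dfa in H.items():
        for q in vertical_states:
            delta[(q, sigma)] = {
                "states": set(dfa["states"]),
                "start": dfa["start"],
                "trans": dict(dfa["trans"]),
                "finals": {s for s, out in dfa["out"].items() if out == q},
            }
    return delta

def horizontal_size(delta):
    """Total number of horizontal states over all copies."""
    return sum(len(d["states"]) for d in delta.values())

# Hypothetical SDTA: 'B' for b-leaves, 'E'/'O' for a-nodes with an even/odd
# number of children.
Q = ["B", "E", "O"]
H = {
    "b": {"states": {"s"}, "start": "s", "trans": {}, "out": {"s": "B"}},
    "a": {
        "states": {"e", "o"},
        "start": "e",
        "trans": {**{("e", q): "o" for q in Q}, **{("o", q): "e" for q in Q}},
        "out": {"e": "E", "o": "O"},
    },
}
delta = sdta_to_dta_dfa(Q, H)
```

The sketch makes one copy per pair $(q, \sigma)$ even when $\lambda_\sigma^{-1}(q)$ is empty, so the number of horizontal states equals the right-hand side of (3) exactly; dropping empty copies would only reduce it.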

Next we give a lower bound for the conversion. 

Lemma 3.2 Let $n, z \in \mathbb{N}$ and choose $\Sigma = \{a, b, 0, 1\}$. There exists an SDTA $B$ with input alphabet $\Sigma$, $n$
vertical states and $z + 4n$ horizontal states, such that any DTA(DFA) for the tree language $L(B)$ has at
least $n$ vertical states and $n(\lfloor \log n \rfloor + 2 + z)$ horizontal states.



Using Lemma 3.2 with $z = n - \lfloor \log n \rfloor$, we see that the upper bound of Lemma 3.1 is tight within a
multiplicative constant. This is stated as:

Theorem 3.1 An SDTA with $n$ vertical and $m$ horizontal states can be simulated by a DTA(DFA) having
$n$ vertical and $n \cdot m$ horizontal states.

For $n \ge 1$, there exists a tree language $L_n$ recognized by an SDTA with $n$ vertical and $O(n)$ horizontal
states such that any DTA(DFA) recognizing $L_n$ has $n$ vertical and $\Omega(n^2)$ horizontal states.



It can be viewed as expected that in the conversion of Theorem 3.1 the number of vertical states does
not change. However, as will be discussed later, in general, for a DTA(DFA) it may be possible to reduce
the number of horizontal states by increasing the number of vertical states.
3.2 Converting a DTA(DFA) to an SDTA

Again we first give an upper bound for the simulation. It is known from [11] (Proposition 24) that the
simulation does not increase the number of vertical states.

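Lemma 3.3 Let $B = (Q, \Sigma, \delta, F)$ be an arbitrary DTA(DFA), where $|Q| = n$. Let $H^B_{q,\sigma} = (S_{q,\sigma}, Q, s^0_{q,\sigma}, F_{q,\sigma}, \gamma_{q,\sigma})$
be a DFA for the horizontal language $\delta(q, \sigma)$, $q \in Q$, $\sigma \in \Sigma$.
We can construct an equivalent SDTA $B'$ where

$$\mathrm{size}(B') \le \Big[\, |Q|;\ \sum_{\sigma \in \Sigma} \Big( \prod_{q \in Q} (|S_{q,\sigma}| - |F_{q,\sigma}|) + \sum_{q \in Q} \Big( |F_{q,\sigma}| \cdot \prod_{p \in Q,\, p \neq q} (|S_{p,\sigma}| - |F_{p,\sigma}|) \Big) \Big) \,\Big].$$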



If $B$ has $m$ horizontal states, Lemma 3.3 gives for the number of horizontal states of $B'$ a worst-case
upper bound that is less than $2^m$ but is not polynomial in $m$. Next we give a lower bound construction.
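To get a feeling for the bound of Lemma 3.3, the following Python sketch evaluates it for one fixed input symbol. It reads the bound as the number of tuples of horizontal states in which at most one component is final; this reading is our interpretation (the proof is omitted in the text), motivated by the fact that disjoint horizontal languages cannot accept simultaneously. The function names are hypothetical.

```python
from math import prod

def lemma_3_3_bound(dfas):
    """dfas: list of (|S_q|, |F_q|) for the horizontal DFAs of one fixed symbol.

    Counts product states in which no component is final, plus those in which
    exactly one component is final.
    """
    nonfinal = [s - f for s, f in dfas]
    none_final = prod(nonfinal)
    one_final = sum(f * prod(nonfinal[:i] + nonfinal[i + 1:])
                    for i, (_, f) in enumerate(dfas))
    return none_final + one_final

def product_bound(dfas):
    """Size of the full product automaton, for comparison."""
    return prod(s for s, _ in dfas)

example = [(3, 1), (4, 1)]   # two DFAs with 3 and 4 states, one final state each
```

For `example` the refined count is 11 while the full product has 12 states, so the saving over the plain product is small; this matches the remark that the bound stays superpolynomial in the total number of horizontal states.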

Lemma 3.4 Let $\Sigma = \{a, b, 0, 1\}$. For any $m \in \mathbb{N}$ and relatively prime numbers $1 < k_1 < k_2 < \ldots <
k_m$, there exists a tree language $L$ over $\Sigma$ recognized by a DTA(DFA) $B$ with $\mathrm{size}(B) = [\, m;\ \sum_{i=1}^{m} k_i +
O(m \log m) \,]$ such that any SDTA recognizing $L$ has at least $m$ vertical states and $\prod_{i=1}^{m} k_i$ horizontal
states.

Proof. Let $y_i \in \{0, 1\}^*$ be the binary representation of $i \ge 1$. We define $L = \bigcup_{1 \le i \le m} a^i((b^{k_i})^* y_i)$.

We define for $L$ a DTA(DFA) $B = (Q, \Sigma, \delta, F)$, where $Q = \{q_1, \ldots, q_m\}$, $F = \{q_1\}$, $\delta(q_i, a) = (b^{k_i})^* \cdot
y_i + q_{i+1}$, for $1 \le i \le m - 1$, and $\delta(q_m, a) = (b^{k_m})^* \cdot y_m$. Note that the bottom-up computation of $B$ is
deterministic because different horizontal languages are marked by distinct binary strings $y_i$.

Each horizontal language $(b^{k_i})^* \cdot y_i + q_{i+1}$ can be recognized by a DFA with $k_i + \lfloor \log i \rfloor + 3$ states,
and in total $B$ has $\sum_{i=1}^{m} k_i + \sum_{i=1}^{m} \lfloor \log i \rfloor + 3m$ horizontal states (and $m$ vertical states).

Let $B' = (Q', \Sigma, \delta', F')$ be an arbitrary SDTA recognizing $L$. By choosing $R = \{a(b^{k_i} y_i) \mid 1 \le i \le
m\} \cup \{a(b)\}$, Lemma 2.1 gives $|Q'| \ge m$.

We show that the DFA $H^{B'}_a$, with notations as in (1), defining transitions corresponding to symbol $a$
needs at least $\prod_{i=1}^{m} k_i$ states. Suppose that $H^{B'}_a$ has less than $\prod_{i=1}^{m} k_i$ states. Then there exist $0 \le j < s <
\prod_{i=1}^{m} k_i$ such that $H^{B'}_a$ reaches the same state after reading strings $b^j$ and $b^s$, respectively. There must exist
$1 \le r \le m$ such that $k_r$ does not divide $s - j$. Let $z = j + (k_r - j \bmod k_r)$. Since $H^{B'}_a$ reaches the same
state on $b^j$ and $b^s$, it follows that $H^{B'}_a$ reaches the same state also on $b^z \cdot y_r$ and $b^{z+s-j} \cdot y_r$, respectively.
This means that $a^r(b^z y_r)$ is accepted by $B'$ if and only if $a^r(b^{z+s-j} \cdot y_r)$ is accepted by $B'$, which is a
contradiction because $k_r$ divides $z$ and does not divide $z + s - j$. ■
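The arithmetic at the end of this proof can be checked mechanically. The Python sketch below, written under the assumption that the $k_i$ are pairwise relatively prime, picks for every pair $0 \le j < s < \prod k_i$ some $k_r$ not dividing $s - j$ and verifies that $k_r$ divides $z$ but not $z + s - j$. The concrete values in `ks` and the helper name `witness` are chosen for illustration only.

```python
from math import prod

def witness(ks, j, s):
    """Pick k_r not dividing s - j and return (k_r, z) with z = j + (k_r - j mod k_r)."""
    # Such a k_r exists because 0 < s - j < prod(ks) and the ks are pairwise coprime.
    k_r = next(k for k in ks if (s - j) % k != 0)
    z = j + (k_r - j % k_r)
    return k_r, z

ks = [2, 3, 5]                 # pairwise relatively prime, chosen for illustration
bound = prod(ks)
checked = 0
for j in range(bound):
    for s in range(j + 1, bound):
        k_r, z = witness(ks, j, s)
        assert z % k_r == 0                # k_r divides z
        assert (z + s - j) % k_r != 0      # k_r does not divide z + s - j
        checked += 1
```

The check is exhaustive only for the chosen `ks`; it illustrates the argument and does not replace it.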

In the above proof, using a more detailed analysis it could be shown that $H^{B'}_a$ needs $\Omega(m \cdot \log m)$
additional states to process the strings $y_i$; however, this would not change the worst-case lower bound.

Now we establish that the upper and lower bounds for the DTA(DFA)-to-SDTA conversion are within 
a multiplicative constant, at least when the sizes of the horizontal DFAs are large compared to the number 
of vertical states. 

Theorem 3.2 An arbitrary DTA(DFA) $B = (Q, \Sigma, \delta, F)$ has an equivalent SDTA $B'$ with

$$\mathrm{size}(B') \le \Big[\, |Q|;\ \sum_{\sigma \in \Sigma} \prod_{q \in Q} \mathrm{size}(H^B_{q,\sigma}) \,\Big], \tag{4}$$

and, for an arbitrary $m \ge 1$ there exists a DTA(DFA) $B = (Q, \Sigma, \delta, F)$ with $|Q| = m$ such that for any
equivalent SDTA $B'$ the size of $B'$ has a lower bound within a multiplicative constant of (4).
Proof. The upper bound follows from Lemma 3.3. We get the lower bound from Lemma 3.4 by choosing
each $k_i$ to be at least $m \cdot \log m$, $i = 1, \ldots, m$. ■

We note that when converting a DTA(DFA) $B = (Q, \Sigma, \delta, F)$ to an equivalent SDTA $A$, for each $\sigma \in \Sigma$
the horizontal DFA $H^A_\sigma$ needs at least as many states as a DFA recognizing $L_{B,\sigma} = \bigcup_{q \in Q} \delta(q, \sigma)$. Note
that from $H^A_\sigma$ we obtain a DFA for $L_{B,\sigma}$ simply by ignoring the output function. However, $H^A_\sigma$ needs to
provide more detailed information for a given input string than a DFA simply recognizing $L_{B,\sigma}$, and in
fact $H^A_\sigma$ recognizes the marked union, as formalized below, of the languages $\delta(q, \sigma)$.

We say that a DFA $A = (Q, \Sigma, s_0, F, \gamma)$ equipped with an output function $\lambda : F \to \{1, \ldots, m\}$
recognizes the marked union of pairwise disjoint regular languages $L_1, \ldots, L_m$, if $L_i = \{w \in \Sigma^* \mid \lambda(\gamma(s_0, w)) =
i\}$, $i = 1, \ldots, m$. The following result establishes that the state complexity of marked union may be
arbitrarily much larger than the state complexity of union.

Proposition 1 Let $A = (Q, \Sigma, s_0, F, \gamma, \lambda)$ be a DFA with output function $\lambda : F \to \{1, \ldots, m\}$ that recognizes
the marked union of disjoint languages $L_i$, $i = 1, \ldots, m$, and let $B$ be the minimal DFA for $\bigcup_{i=1}^{m} L_i$.
Then $\mathrm{size}(A) \ge \mathrm{size}(B)$, and for any $m \ge 1$ there exist disjoint regular languages $L_i$, $1 \le i \le m$, such
that $\mathrm{size}(B) = 1$ and $\mathrm{size}(A) \ge m$.
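One family of languages that appears to witness the second claim of Proposition 1 is sketched below in Python. Over a unary alphabet we take $L_i = \{a^k \mid k \equiv i \pmod{m}\}$ for $i = 1, \ldots, m$; the choice of this family is our assumption, since the text gives no proof. The union is $a^*$, which has a one-state DFA, whereas the output function has to take $m$ distinct values and therefore needs at least $m$ final states.

```python
def marked_union_dfa(m):
    """Unary DFA with output function for L_i = { a^k : k = i (mod m) }, i = 1..m."""
    states = list(range(m))
    trans = {s: (s + 1) % m for s in states}          # the single input symbol 'a'
    out = {s: (s if s != 0 else m) for s in states}   # every state is final
    return {"start": 0, "trans": trans, "out": out}

def mark(dfa, length):
    """Return the index i of the language L_i that contains a^length."""
    s = dfa["start"]
    for _ in range(length):
        s = dfa["trans"][s]
    return dfa["out"][s]

m = 4
dfa = marked_union_dfa(m)
marks = {mark(dfa, k) for k in range(3 * m)}   # all m marks occur
```

Since every word receives some mark, a plain DFA for the union can accept everything with one state, while `marks` shows that all `m` outputs are needed.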

4 Converting nondeterministic tree automata to deterministic automata 

In this section we consider conversions of different variants of nondeterministic automata into equivalent 
strongly and weakly deterministic automata. 

4.1 Converting a nondeterministic automaton to an SDTA 

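Lemma 4.1 Let $A = (Q, \Sigma, \delta, F)$ be an NTA(NFA) and for $q \in Q$, $\sigma \in \Sigma$ denote $\mathrm{size}(H^A_{q,\sigma}) = m_{q,\sigma}$.

(i) We can construct an equivalent SDTA $B$ where

$$\mathrm{size}(B) \le \Big[\, 2^{|Q|};\ \sum_{\sigma \in \Sigma} 2^{\left(\sum_{q \in Q} m_{q,\sigma}\right)} \,\Big]. \tag{5}$$

(ii) If $A$ is a DTA(NFA), in the upper bound (5) the number of vertical states is at most $|Q|$.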

We do not require the automaton to be complete and, naturally, in (5) the number of vertical states of $B$
could be reduced to $2^{|Q|} - 1$. A similar small improvement could be made to the number of horizontal
states, but it would make the formula look rather complicated.



Also, in Lemma 4.1 (ii) the upper bound for the number of horizontal states could be slightly reduced
using a more detailed analysis, as in the proof of Lemma 3.3, that takes into account that, in no situation,
two distinct NFAs defining the horizontal languages associated with a fixed input symbol $\sigma$ can accept
simultaneously.

Lemma 4.1 did not discuss the case where the bottom-up computation is nondeterministic but the
horizontal languages are represented in terms of DFAs. We note that for an NTA(DFA) $A = (Q, \Sigma, \delta, F)$
the construction used in the proof of Lemma 4.1 gives for the size of an equivalent SDTA only the upper
bound (5). Although the horizontal languages of $A$ are defined using DFAs, the horizontal languages of
the equivalent SDTA $B$ are over the alphabet $\mathcal{P}(Q)$, and this means that the upper bound for the number
of horizontal states would not be improved.
Next we state two lower bound results. 
Lemma 4.2 Let $\Sigma = \{a, b\}$. For any relatively prime numbers $m_1, m_2, \ldots, m_n$, there exists a tree language
$L$ over $\Sigma$ such that $L$ is recognized by an NTA(DFA) $A$ with $\mathrm{size}(A) \le [\, n;\ (\sum_{i=1}^{n} m_i) + 2n - 2 \,]$, and any
SDTA for $L$ needs at least $2^n - 1$ vertical states and $(\prod_{i=1}^{n} m_i) - 1$ horizontal states.

Lemma 4.3 For $n \ge 1$, there exists a tree language $L_n$ recognized by a DTA(NFA) $A$ with $n$ vertical and
less than $n \log n$ horizontal states such that for any SDTA $B$ for $L_n$, $\mathrm{size}(B) \ge [\, n;\ 2^n \,]$.

The lower bounds given by the above two lemmas are far away from the corresponding upper
bounds in Lemma 4.1. Furthermore, we do not have a worst-case construction for general NTA(NFA)'s
that would provably give an essentially better lower bound than the one obtained for NTA(DFA)'s in
Lemma 4.2.

4.2 Converting a nondeterministic automaton to a DTA(DFA) 

We begin with a simulation result establishing an upper bound. 

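Lemma 4.4 Let $A = (Q, \Sigma, \delta, F)$ be an NTA(NFA) and for $q \in Q$, $\sigma \in \Sigma$ denote $\mathrm{size}(H^A_{q,\sigma}) = m_{q,\sigma}$.

(i) There exists a DTA(DFA) $B$ equivalent to $A$ where

$$\mathrm{size}(B) \le \Big[\, 2^{|Q|};\ 2^{|Q|} \cdot \Big( \sum_{\sigma \in \Sigma} 2^{\left(\sum_{q \in Q} m_{q,\sigma}\right)} \Big) \,\Big]. \tag{6}$$

(ii) If $A$ is a DTA(NFA), it has an equivalent DTA(DFA) $B$ where

$$\mathrm{size}(B) \le \Big[\, |Q|;\ \sum_{q \in Q} \sum_{\sigma \in \Sigma} 2^{m_{q,\sigma}} \,\Big].$$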

Roughly speaking, the simulation uses a standard subset construction [18] for the set of vertical
states, and in order to guarantee that the bottom-up computation remains deterministic the DFA for
the horizontal language corresponding to $P \subseteq Q$, $\sigma \in \Sigma$, needs to simulate each horizontal NFA of $A$
corresponding to $\sigma$. In the case where $A$ is an NTA(DFA) we do not have a significantly better bound
than (6), because the horizontal languages of the DTA(DFA) consist of strings of subsets of $Q$, which
means that we again have to simulate multiple computations of each horizontal DFA of $A$. In the
lower bound construction of Theorem 4.1 below we, in fact, use an NTA(DFA).



We do not have a lower bound that would match the bound of Lemma 4.4. Recall that strongly
deterministic automata can be minimized efficiently and the minimal automaton is unique [5]; however,
minimal DTA(DFA)'s are, in general, not unique and minimization is intractable [11]. When trying
to establish lower bounds for the size of a DTA(DFA) $A = (Q, \Sigma, \delta, F)$ there is the difficulty that by
adding more vertical states, and hence more horizontal languages, it may still be possible that the total
number of horizontal states is reduced. For example, suppose that $A$ has a horizontal language $\delta(q, \sigma) =
(a + b)^* b (a + b)^7$, where the minimal DFA has 256 states.³ This language can be represented as a disjoint
union of 8 regular languages where the sum of the sizes of the minimal DFAs is only 176. Thus, by
replacing the state $q$ by 8 distinct vertical states (that could be equivalent in the bottom-up computation)
we could reduce the size of $A$.
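The trade-off can be reproduced experimentally. The Python sketch below minimizes a DFA for $(a+b)^* b (a+b)^7$ and DFAs for one particular disjoint decomposition, namely splitting the language by the length of the word modulo 8. This decomposition is our own choice for illustration and is not claimed to be the one behind the figure 176 quoted above; the helper names are likewise hypothetical. Minimization uses plain partition refinement on the reachable part of a complete DFA.

```python
from collections import deque

def minimal_size(start, step, accepting, alphabet="ab"):
    """Number of states of the minimal complete DFA (Moore partition refinement)."""
    # Collect the reachable states.
    seen, queue = {start}, deque([start])
    while queue:
        s = queue.popleft()
        for c in alphabet:
            t = step(s, c)
            if t not in seen:
                seen.add(t)
                queue.append(t)
    # Refine the accepting / non-accepting partition until it is stable.
    block = {s: int(accepting(s)) for s in seen}
    while True:
        ids = {}
        new_block = {}
        for s in seen:
            signature = (block[s],) + tuple(block[step(s, c)] for c in alphabet)
            new_block[s] = ids.setdefault(signature, len(ids))
        if len(ids) == len(set(block.values())):
            return len(ids)
        block = new_block

def full_language_size():
    # State = the last 8 symbols as bits (b = 1); accept if the oldest bit is 1.
    return minimal_size(0,
                        lambda s, c: ((s << 1) | (c == "b")) & 0xFF,
                        lambda s: bool(s & 0x80))

def part_size(j):
    # Words of the language whose length is congruent to j modulo 8.
    # State = (length mod 8, was the symbol read at the last position = j mod 8 a 'b').
    def step(s, c):
        r, bit = s
        return ((r + 1) % 8, (c == "b") if r == j else bit)
    return minimal_size((0, False), step, lambda s: s[0] == j and s[1])

full = full_language_size()
parts = [part_size(j) for j in range(8)]
```

Comparing `full` with `sum(parts)` shows that spreading one horizontal language over several vertical states can lower the total number of horizontal states, which is exactly what makes lower bounds for DTA(DFA)'s hard to prove.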

In fact, we do not have a general lower bound condition, analogous to Lemma 2.2, for the number of
horizontal states of DTA(DFA)'s and the lower bound result below relies on an ad hoc proof.



3 Note that $\delta(q, \sigma)$ is a typical example of a language where the NFA-to-DFA size blow-up is large.
Let $\Sigma = \{a, b\}$. Let $p_1, \ldots, p_n$ be the first $n$ primes. Define the tree language

$$T_n = \{\, a^i(b^k) \mid i \ge 1,\ k \ge 0,\ (\exists\, 1 \le j \le n)[\, k \equiv 0 \ (\mathrm{mod}\ p_j) \text{ and } i \equiv j \ (\mathrm{mod}\ n) \,] \,\}.$$
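Membership in $T_n$ depends only on the two numbers $i$ and $k$, so it can be tested directly. The following Python sketch does this under our reading of the definition, where the tree $a^i(b^k)$ belongs to $T_n$ if some index $j$ has $p_j$ dividing $k$ and $i \equiv j \pmod{n}$; the function names are hypothetical.

```python
def first_primes(n):
    """Return the first n primes by trial division (adequate for small n)."""
    primes, candidate = [], 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes

def in_T(n, i, k):
    """Is the tree a^i(b^k) in T_n?"""
    if i < 1 or k < 0:
        return False
    primes = first_primes(n)
    return any(k % primes[j - 1] == 0 and (i - j) % n == 0
               for j in range(1, n + 1))
```

For example, with $n = 3$ the primes are 2, 3 and 5, so `in_T(3, 1, 4)` holds via $j = 1$, while `in_T(3, 1, 3)` fails because only $j = 1$ is congruent to $i = 1$ modulo 3 and 2 does not divide 3.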

Theorem 4.1 The tree language $T_n$ can be recognized by an NTA(DFA) $A$ with $\mathrm{size}(A) = [\, n;\ (\sum_{i=1}^{n} p_i) +
2n \,]$, and for any DTA(DFA) $B$ recognizing $T_n$,

$$\mathrm{size}(B) \ge \Big[\, 2^n - 1;\ (2^n - 1) \prod_{i=1}^{n} p_i \,\Big].$$



Theorem 4.1 gives a construction where converting an NTA(DFA) to a DTA(DFA) causes an exponential
blow-up in the number of vertical states, and additionally the size of each of the (exponentially many)
horizontal DFAs is considerably larger than the original DFA. However, the size blow-up of the horizontal
DFAs does not match the upper bound of Lemma 4.4. In the proof of Theorem 4.1, roughly speaking,
we use a particular type of unary horizontal languages in order to be able to (provably) establish that
there cannot be a trade-off between the numbers of vertical and horizontal states, and with this type of
constructions it seems difficult to approach the worst-case size blow-up of Lemma 4.4.



5 Conclusion 

We have studied the state complexity of conversions between different models of tree automata operating 
on unranked trees. For the conversion of weakly deterministic automata into strongly deterministic 
automata, and vice versa, we established lower bounds that are within a multiplicative constant of the 
corresponding upper bound. However, for the size blow-up of converting nondeterministic automata to 
(strongly and weakly) deterministic automata the upper and lower bounds remain far apart, and this is a 
topic for further research. 

Since a minimal weakly deterministic automaton need not be unique [11], it is, in general, hard to
establish lower bounds for the number of horizontal states of weakly deterministic automata and we do
not have tools like Lemma 2.2 that is used for strongly deterministic automata. Weakly deterministic
automata can have trade-offs between the numbers of vertical and horizontal states, respectively, and it 
would be useful to establish some upper bounds for how much the number of horizontal states can be 
reduced by introducing additional vertical states. 